The effects of analysing cohesion on document summarisation

Authors

  • Branimir Boguraev
  • Mary S. Neff
Abstract

We argue that, in general, the analysis of lexical cohesion factors in a document can drive a summarizer, as well as enable other content characterization tasks. More narrowly, this paper focuses on how one particular cohesion factor, simple lexical repetition, can enhance an existing sentence extraction summarizer by enabling strategies for overcoming some particularly jarring end-user effects in the summaries, typically due to coherence degradation, readability deterioration, and topical under-representation. Lexical repetition is instrumental to, among other things, the topical make-up of a text. In our framework, a lexical repetition-based model of discourse segmentation, capable of detecting topic shifts, is integrated with a linguistically-aware summarizer utilizing notions of salience and dynamically-adjustable summary size. We show that even by leveraging lexical repetition alone, summaries are of comparable, and under certain conditions better, quality than the ones delivered by a state-of-the-art summarizer. This is encouraging for a broad research platform focusing on the recognition and use of cohesive devices in text for a range of content characterisation and document management tasks.
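As a rough illustration of how lexical repetition can signal topic shifts, the sketch below compares the vocabulary of adjacent blocks of sentences and hypothesises a boundary where their overlap is low. This is a generic block-comparison approach and not the segmentation model described in the paper; the function names, the `window` size, and the `threshold` value are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase word tokens; a crude stand-in for real lexical analysis."""
    return re.findall(r"[a-z]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def topic_boundaries(sentences, window=2, threshold=0.1):
    """Return each index i where a topic shift is hypothesised before sentence i.

    `window` and `threshold` are illustrative values, not tuned parameters.
    """
    bags = [Counter(tokenize(s)) for s in sentences]
    boundaries = []
    for i in range(1, len(bags)):
        # Pool the term counts of up to `window` sentences on each side of the gap.
        left = sum(bags[max(0, i - window):i], Counter())
        right = sum(bags[i:i + window], Counter())
        # Little repeated vocabulary across the gap suggests a topic shift.
        if cosine(left, right) < threshold:
            boundaries.append(i)
    return boundaries


sentences = [
    "The cat sat on the mat.",
    "The cat chased the mouse.",
    "Stock markets fell sharply.",
    "Markets recovered as stock prices rose.",
]
print(topic_boundaries(sentences))
```

A sentence extraction summarizer could use such boundaries to check that every detected segment contributes at least one sentence, which is one plausible way to address topical under-representation.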

Similar resources

Structure-preserving and query-biased document summarisation for web searching

Purpose – The purpose of this paper is to develop a new summarisation approach, namely structure-preserving and query-biased summarisation, to improve the effectiveness of web searching. During web searching, one aid for users is the document summaries provided in the search results. However, the summaries provided by current search engines have limitations in directing users to relevant docume...

A task-oriented study on the influencing effects of query-biased summarisation in web searching

The aim of the work described in this paper is to evaluate the influencing effects of query-biased summaries in web searching. For this purpose, a summarisation system has been developed, and a summary tailored to the user's query is generated automatically for each document retrieved. The system aims to provide both a better means of assessing document relevance than titles or abstracts typica...

Cohesion Grading Decisions in a Summary Evaluation Environment: A Machine Learning Approach

The work presented in this paper has been carried out in the context of a summary writing environment provided with automatic grading. Regarding summarisation discourse, some of the most relevant variables identified in previous work are comprehension, adequacy, use of language, coherence, and cohesion. This work is focused on cohesion. The described exploratory study starts from basic automati...

The effects of analysing cohesion on document summarization

This paper describes a framework for multi-document summarization which combines three premises: coherent themes can be identified reliably; highly representative themes, running across subsets of the document collection, can function as multi-document summary surrogates; and effective end-use of such themes should be facilitated by a visualization environment which clarifies the relationship b...

POLIS : a probabilistic summarisation logic for structured documents

As the availability of structured documents, formatted in markup languages such as SGML, RDF, or XML, increases, retrieval systems increasingly focus on the retrieval of document-elements, rather than entire documents. Additionally, abstraction layers in the form of formalised retrieval logics have allowed developers to include search facilities into numerous applications, without the need of h...

Publication date: 2000